1 Introduction

As addition is commutative but subtraction is not, the set of sums

\[ S + S := \{ s_1 + s_2 : s_i \in S \} \]

of a finite set $S$ is predisposed to be smaller than the set of differences

\[ S - S := \{ s_1 - s_2 : s_i \in S \}. \]
As Nathanson [6] wrote:

"Even though there exist sets A that have more sums than differences, such sets 
should be rare, and it must be true with the right way of counting that the vast 
majority of sets satisfies $|A - A| > |A + A|$."

Following this reasoning, one would suspect that a vanishingly small proportion of the $2^n$
subsets of $\{0, 1, 2, \dots, n-1\}$ have more sums than differences. Our purpose, however, is
to show that this is not the case. The following terminology will be used throughout this
article:

Definition. A finite set $S$ is difference-dominant if $|S - S| > |S + S|$, sum-dominant if
$|S + S| > |S - S|$, and sum-difference-balanced if $|S + S| = |S - S|$.

Nathanson [7] calls sum-dominant sets "MSTD" sets, short for "More Sums Than Differences".
We refer the reader to [6, 7] for the history of this problem.

Our main theorem shows that, perhaps contrary to intuition, all three types of set in 
the above definition are ubiquitous. 

Theorem 1. Let $P$ be any arithmetic progression of length $n$. A positive proportion of the subsets
of $P$ are difference-dominant, a positive proportion are sum-dominant, and a positive proportion are
sum-difference-balanced. More precisely, there exists $c > 0$ such that for all $n \ge 15$,

\[ \#\{S \subseteq P : S \text{ is difference-dominant}\} > c\,2^n, \]

\[ \#\{S \subseteq P : S \text{ is sum-dominant}\} > c\,2^n, \]

\[ \#\{S \subseteq P : S \text{ is sum-difference-balanced}\} > c\,2^n. \]



We observe that the sizes of $S + S$ and $S - S$ are invariant under translation and dilation of
$S$, so that without loss of generality we can restrict our attention to $P = \{0, 1, 2, \dots, n-1\}$.
The following examples show that none of the three categories is empty for $n \ge 15$:

Example. The set $S = \{0, 1, 3\}$ has $S + S = \{0, 1, 2, 3, 4, 6\}$ and $S - S = \{-3, -2, -1, 0, 1, 2, 3\}$;
therefore $S$ is difference-dominant, since $|S - S| = 7 > 6 = |S + S|$.

Example. The set $S = \{0, 2, 3, 4, 7, 11, 12, 14\}$ has $S + S = \{0, \dots, 28\} \setminus \{1, 20, 27\}$ and
$S - S = \{-14, \dots, 14\} \setminus \{-13, -6, 6, 13\}$; therefore $S$ is sum-dominant, since
$|S + S| = 26 > 25 = |S - S|$.

Example. A set $S$ is symmetric if $S = a^* - S$ for some $a^* \in \mathbb{R}$. Any symmetric set has
$S + S = S + (a^* - S) = a^* + (S - S)$; therefore symmetric sets are sum-difference-balanced.
In particular, any interval or arithmetic progression is sum-difference-balanced.
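
The three examples above can be checked mechanically. The following is a minimal Python sketch, not part of the original argument; the helper names `sumset`, `diffset`, and `classify` are hypothetical and chosen only for illustration.

```python
def sumset(s):
    """Return S + S as a set."""
    return {a + b for a in s for b in s}


def diffset(s):
    """Return S - S as a set."""
    return {a - b for a in s for b in s}


def classify(s):
    """Classify a finite set of integers by comparing |S + S| with |S - S|."""
    sums, diffs = len(sumset(s)), len(diffset(s))
    if diffs > sums:
        return "difference-dominant"
    if sums > diffs:
        return "sum-dominant"
    return "sum-difference-balanced"


if __name__ == "__main__":
    examples = ({0, 1, 3}, {0, 2, 3, 4, 7, 11, 12, 14}, set(range(10)))
    for s in examples:
        print(sorted(s), len(sumset(s)), len(diffset(s)), classify(s))
```

The third set is an interval, which is symmetric and therefore balanced, as the last example predicts.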

The idea behind Theorem 1 is the following. Most subsets of $\{0, 1, 2, \dots, n-1\}$ have
about $n/2$ elements; call our typical subset $S$. Each $k \in \{0, 1, 2, \dots, 2n-2\}$ has, on average,
roughly $n/4 - |n - k|/4$ representations as a sum of two elements of $S$. Not only is this
positive, it is quite large except when $k$ is near $0$ or $2n - 2$. Similarly, each nonzero
$k \in \{-(n-1), \dots, n-1\}$ has, on average, roughly $n/4 - |k|/4$ representations as a difference
of two elements of $S$. Not only is this positive, it is quite large except when $|k|$ is near
$n - 1$. Putting these together, the sizes of the sumset and difference set are predominantly
affected by the elements of $S$ that are near $0$ or near $n$. If we choose the "fringe" of $S$
cleverly, the middle of $S$ will become largely irrelevant.

This philosophy suggests the following conjecture; see Section 7 for a more refined
conjecture.

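Conjecture 2. Let $P$ be any arithmetic progression with length $n$. The limiting proportions

\[ \lim_{n \to \infty} 2^{-n}\, \#\{S \subseteq P : S \text{ is difference-dominant}\}, \]
\[ \lim_{n \to \infty} 2^{-n}\, \#\{S \subseteq P : S \text{ is sum-dominant}\}, \]
\[ \lim_{n \to \infty} 2^{-n}\, \#\{S \subseteq P : S \text{ is sum-difference-balanced}\} \]

all exist and are positive.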

The following result, on the other hand, supports Nathanson's instinct as quoted
above, with one interpretation of "the right way" and a suitably humble understanding
of "vast". Theorem 3 is proved in Section 4.

Theorem 3. Let $P$ be any arithmetic progression with length $n$. On average, the difference set of a
subset of $P$ has 4 more elements than its sumset. More precisely,

\[ \frac{1}{2^n} \sum_{S \subseteq P} |S - S| \sim 2n - 7, \]

\[ \frac{1}{2^n} \sum_{S \subseteq P} |S + S| \sim 2n - 11. \]



Nathanson asks for the possible values of $|A + A| - |A - A|$. We show by construction
in Section 5 that the range of $|A + A| - |A - A|$ is $\mathbb{Z}$; in fact our constructions are
economical, in the sense of the following theorem, which is the subject of Section 5.

Theorem 4. For every integer $x$, there is a set $S \subseteq \{0, 1, \dots, 17|x|\}$ with $|S + S| - |S - S| = x$.

Acknowledgements. The first author was supported in part by grants from the Natural 
Sciences and Engineering Research Council. The second author was supported in part by 
a grant from The City University of New York PSC-CUNY Research Award Program. The 
second author also acknowledges helpful discussions with Natella V. O'Bryant. 



2 Sums and differences in randomly chosen sets 

In this section, we establish several ancillary results on the probabilities that particular
sums and differences are present or absent in sets chosen randomly from certain classes of
sets. We will consider in particular the following classes: Let $n$, $\ell$, and $u$ be integers with
$n > \ell + u$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and $U \subseteq \{n-u, \dots, n-1\}$. We will consider the set of all
subsets $A \subseteq \{0, \dots, n-1\}$ satisfying $A \cap \{0, \dots, \ell-1\} = L$ and $A \cap \{n-u, \dots, n-1\} = U$
as a probability space endowed with the uniform probability, where each such set $A$
occurs with probability $2^{-(n-\ell-u)}$.

All of the calculations in this section are straightforward, but the details depend upon 
the size and sometimes the parity of the particular sum or difference we are investigating, 
and so the lemmas herein are rather ugly. The reader with limited tolerance could scan
Propositions 8 and 12 and move on to the next section without significantly interrupting
the flow of ideas.

We begin with three lemmas describing the probabilities of particular sums missing 
from A + A, where A is chosen randomly from a class of the type indicated above. 

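Lemma 5. Let $n$, $\ell$, and $u$ be integers with $n > \ell + u$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and
$U \subseteq \{n-u, \dots, n-1\}$. Suppose that $R$ is a uniformly randomly chosen subset of $\{\ell, \dots, n-u-1\}$,
and set $A := L \cup R \cup U$. Then for any integer $k$ satisfying $2\ell - 1 \le k \le n - u - 1$, the probability

\[
\mathbb{P}[k \notin A + A] =
\begin{cases}
(1/2)^{|L|} (3/4)^{(k+1)/2 - \ell}, & \text{if $k$ is odd,} \\
(1/2)^{|L|+1} (3/4)^{k/2 - \ell}, & \text{if $k$ is even.}
\end{cases}
\]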



Proof. Define random variables $X_j$ by setting $X_j = 1$ if $j \in A$ and $X_j = 0$ otherwise. By the
definition of $A$, the variables $X_j$ are independent random variables for $\ell \le j \le n - u - 1$,
each taking the values $0$ and $1$ with probability $1/2$ each, while the variables $X_j$ for
$0 \le j \le \ell - 1$ and $n - u \le j \le n - 1$ have values that are fixed by the choices of $L$ and $U$.

We have $k \notin A + A$ if and only if $X_j X_{k-j} = 0$ for all $0 \le j \le k/2$; the key point is that
these variables $X_j X_{k-j}$ are independent of one another. Therefore

\[ \mathbb{P}[k \notin A + A] = \prod_{0 \le j \le k/2} \mathbb{P}[X_j X_{k-j} = 0]. \]

If $k$ is odd, this becomes

\begin{align*}
\mathbb{P}[k \notin A + A] &= \prod_{j=0}^{\ell-1} \mathbb{P}[X_j X_{k-j} = 0] \prod_{j=\ell}^{(k-1)/2} \mathbb{P}[X_j X_{k-j} = 0] \\
&= \prod_{j \in L} \mathbb{P}[X_{k-j} = 0] \prod_{j=\ell}^{(k-1)/2} \mathbb{P}[X_j = 0 \text{ or } X_{k-j} = 0] \\
&= (1/2)^{|L|} (3/4)^{(k+1)/2 - \ell}.
\end{align*}

On the other hand, if $k$ is even then

\begin{align*}
\mathbb{P}[k \notin A + A] &= \prod_{j=0}^{\ell-1} \mathbb{P}[X_j X_{k-j} = 0] \Bigg( \prod_{j=\ell}^{k/2-1} \mathbb{P}[X_j X_{k-j} = 0] \Bigg) \mathbb{P}[X_{k/2} X_{k/2} = 0] \\
&= \prod_{j \in L} \mathbb{P}[X_{k-j} = 0] \Bigg( \prod_{j=\ell}^{k/2-1} \mathbb{P}[X_j = 0 \text{ or } X_{k-j} = 0] \Bigg) \mathbb{P}[X_{k/2} = 0] \\
&= (1/2)^{|L|} (3/4)^{k/2 - \ell} \cdot \tfrac{1}{2}.
\end{align*}

□



Lemma 6. Let $n$, $\ell$, and $u$ be integers with $n > \ell + u$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and
$U \subseteq \{n-u, \dots, n-1\}$. Suppose that $R$ is a uniformly randomly chosen subset of $\{\ell, \dots, n-u-1\}$,
and set $A := L \cup R \cup U$. Then for any integer $k$ satisfying $n + \ell - 1 \le k \le 2n - 2u - 1$, the
probability

\[
\mathbb{P}[k \notin A + A] =
\begin{cases}
(1/2)^{|U|} (3/4)^{n - (k+1)/2 - u}, & \text{if $k$ is odd,} \\
(1/2)^{|U|+1} (3/4)^{n - 1 - k/2 - u}, & \text{if $k$ is even.}
\end{cases}
\]

Proof. This follows from Lemma 5 applied to the parameters $\ell' = u$ and $L' = n - 1 - U$,
$u' = \ell$ and $U' = n - 1 - L$, and $A' = n - 1 - A$ and $k' = 2n - 2 - k$. □

Lemma 7. Suppose that $A$ is a uniformly randomly chosen subset of $\{0, \dots, n-1\}$. Then for any
integer $0 \le k \le n - 1$, the probability

\[
\mathbb{P}[k \notin A + A] =
\begin{cases}
(3/4)^{(k+1)/2}, & \text{if $k$ is odd,} \\
\tfrac{1}{2} (3/4)^{k/2}, & \text{if $k$ is even;}
\end{cases}
\]

while for any integer $n - 1 \le k \le 2n - 2$, the probability

\[
\mathbb{P}[k \notin A + A] =
\begin{cases}
(3/4)^{n - (k+1)/2}, & \text{if $k$ is odd,} \\
\tfrac{1}{2} (3/4)^{n - 1 - k/2}, & \text{if $k$ is even.}
\end{cases}
\]

Proof. This follows immediately from Lemmas 5 and 6 upon setting $\ell = u = 0$ and
$L = U = \emptyset$. □
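
As a sanity check on Lemma 7, the exact probabilities can be compared against brute-force enumeration for small $n$. This is only an illustrative sketch; the function names `missing_sum_probability` and `lemma7_formula` are hypothetical, and exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.

```python
from fractions import Fraction


def missing_sum_probability(n, k):
    """Exact P[k not in A + A] for A uniform among subsets of {0, ..., n-1}."""
    misses = 0
    for mask in range(1 << n):
        a = [j for j in range(n) if (mask >> j) & 1]
        if all(x + y != k for x in a for y in a):
            misses += 1
    return Fraction(misses, 1 << n)


def lemma7_formula(n, k):
    """Closed form from Lemma 7, for 0 <= k <= 2n - 2."""
    if k > n - 1:
        k = 2 * n - 2 - k  # reflect A -> (n - 1) - A, as in the proof of Lemma 6
    if k % 2 == 1:
        return Fraction(3, 4) ** ((k + 1) // 2)
    return Fraction(1, 2) * Fraction(3, 4) ** (k // 2)


if __name__ == "__main__":
    n = 7
    for k in range(2 * n - 1):
        print(k, missing_sum_probability(n, k), lemma7_formula(n, k))
```

The enumeration costs $2^n$ steps per value of $k$, so it is practical only for small $n$.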

We now use these lemmas to establish the following proposition, in which we want a
positive probability that many integers $k$ appear in the sumset $A + A$. While these events,
varying over $k$, are not independent, we need only a lower bound on the probability; hence
it suffices to combine crudely the exact probabilities given in Lemmas 5 and 6. We
emphasize that we have made no effort to optimize the lower bound given in the following
proposition.

Proposition 8. Let $n$, $\ell$, and $u$ be integers with $n > \ell + u$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and
$U \subseteq \{n-u, \dots, n-1\}$. Suppose that $R$ is a uniformly randomly chosen subset of $\{\ell, \dots, n-u-1\}$,
and set $A := L \cup R \cup U$. Then the probability that

\[ \{2\ell - 1, \dots, n - u - 1\} \cup \{n + \ell - 1, \dots, 2n - 2u - 1\} \subseteq A + A \]

is greater than $1 - 6\left(2^{-|L|} + 2^{-|U|}\right)$.

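Proof. We employ the crude inequality

\begin{align*}
&\mathbb{P}\big[\{2\ell - 1, \dots, n - u - 1\} \cup \{n + \ell - 1, \dots, 2n - 2u - 1\} \not\subseteq A + A\big] \\
&\qquad \le \sum_{k=2\ell-1}^{n-u-1} \mathbb{P}[k \notin A + A] + \sum_{k=n+\ell-1}^{2n-2u-1} \mathbb{P}[k \notin A + A].
\end{align*}

The first sum can be bounded, using Lemma 5, by

\begin{align*}
\sum_{k=2\ell-1}^{n-u-1} \mathbb{P}[k \notin A + A]
&< \sum_{\substack{k \ge 2\ell-1 \\ k \text{ odd}}} (1/2)^{|L|} (3/4)^{(k+1)/2 - \ell}
 + \sum_{\substack{k \ge 2\ell-1 \\ k \text{ even}}} (1/2)^{|L|+1} (3/4)^{k/2 - \ell} \\
&= (1/2)^{|L|} \sum_{m=0}^{\infty} (3/4)^m + (1/2)^{|L|+1} \sum_{m=0}^{\infty} (3/4)^m = 6\,(1/2)^{|L|}.
\end{align*}

The second sum can be bounded in a similar way using Lemma 6, yielding

\[ \sum_{k=n+\ell-1}^{2n-2u-1} \mathbb{P}[k \notin A + A] < 6\,(1/2)^{|U|}. \]

Therefore $\mathbb{P}[\{2\ell - 1, \dots, n - u - 1\} \cup \{n + \ell - 1, \dots, 2n - 2u - 1\} \not\subseteq A + A]$ is bounded above
by $6(1/2)^{|L|} + 6(1/2)^{|U|}$, which is equivalent to the statement of the proposition. □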

We turn now to three lemmas describing the probabilities that particular differences
are missing from $A - A$, where $A$ is chosen randomly from one of our classes. A new
obstacle appears: while the random variables $X_j X_{k-j}$ controlling the presence of the sum
$k$ in $A + A$ are always mutually independent, the same is not true of the random variables
$X_j X_{k+j}$ controlling the presence of the difference $k$ in $A - A$, at least when $k$ is small
enough that $j$, $k + j$, and $2k + j$ can all lie between $0$ and $n - 1$. Fortunately, when $k$ is this
small the probabilities in question are already minuscule, so a simple argument provides
a serviceable bound (Lemma 10 below).

Lemma 9. Let $n$, $\ell$, and $u$ be integers with $n > \ell + u$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and
$U \subseteq \{n-u, \dots, n-1\}$. Suppose that $R$ is a uniformly randomly chosen subset of $\{\ell, \dots, n-u-1\}$,
and set $A := L \cup R \cup U$. Then for any integer $k$ satisfying $n/2 \le k \le n - u - \ell$, the probability

\[ \mathbb{P}[k \notin A - A] = (1/2)^{|L|+|U|} (3/4)^{n - \ell - u - k}. \]

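Proof. Define random variables $X_j$ by setting $X_j = 1$ if $j \in A$ and $X_j = 0$ otherwise, as in the
proof of Lemma 5. We have $k \notin A - A$ if and only if $X_j X_{k+j} = 0$ for all $0 \le j \le n - 1 - k$,
and again these variables $X_j X_{k+j}$ are independent of one another. Therefore

\begin{align*}
\mathbb{P}[k \notin A - A] &= \prod_{j=0}^{n-1-k} \mathbb{P}[X_j X_{k+j} = 0] \\
&= \prod_{j=0}^{\ell-1} \mathbb{P}[X_j X_{k+j} = 0] \prod_{j=\ell}^{n-u-k-1} \mathbb{P}[X_j X_{k+j} = 0] \prod_{j=n-u-k}^{n-1-k} \mathbb{P}[X_j X_{k+j} = 0] \\
&= \prod_{j \in L} \mathbb{P}[X_{k+j} = 0] \prod_{j=\ell}^{n-u-k-1} \mathbb{P}[X_j = 0 \text{ or } X_{k+j} = 0] \prod_{j \in U - k} \mathbb{P}[X_j = 0] \\
&= (1/2)^{|L|} (3/4)^{n - \ell - u - k} (1/2)^{|U|}.
\end{align*}

□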

Lemma 10. Let $a$ and $b$ be integers with $a < b$. Suppose that $R$ is a uniformly randomly chosen
subset of $\{a, \dots, b-1\}$. Then for any integer $k$ satisfying $1 \le k \le 2(b-a)/3$, the probability

\[ \mathbb{P}[k \notin R - R] \le (3/4)^{(b-a)/3}. \]

Remark. In fact, the probability in question can be written exactly in terms of products of
Fibonacci numbers: in the simplest case, $\mathbb{P}[1 \notin R - R] = F_{b-a+2}/2^{b-a}$. However, the
resulting expressions would become too tedious to handle in our applications below. When
$b - a$ is large and $k$ is small, the actual value of the probability $\mathbb{P}[k \notin R - R]$ is proportional
to $((1 + \sqrt{5})/4)^{b-a} \approx 0.809^{b-a}$, whereas the bound in Lemma 10 gives $(3/4)^{(b-a)/3} \approx 0.909^{b-a}$.
However, in the particular case $k = (b-a)/2$, the probability in question is
exactly $(3/4)^{(b-a)/2} \approx 0.866^{b-a}$, so the bound in Lemma 10 is not too unreasonable.

Proof. Define the set

\[ J := \{ a \le j < b - k : \lfloor (j - a)/k \rfloor \text{ is even} \}. \]

In other words, $J$ contains the first $k$ integers starting at $a$, then omits the following $k$
integers, then contains the next $k$ integers, and so on until the upper bound $a + 2(b-a)/3$ is
reached. The following properties of $J$ can be easily verified:

(i) if $j \in J$, then $j + k \notin J$;

(ii) $|J| \ge (b-a)/3$.

Now define random variables $X_j$ by setting $X_j = 1$ if $j \in R$ and $X_j = 0$ otherwise, as in the
proof of Lemma 5. We have $k \notin R - R$ if and only if $X_j X_{k+j} = 0$ for all $a \le j < b - k$. In particular,

\begin{align*}
\mathbb{P}[k \notin R - R] &= \mathbb{P}[X_j X_{k+j} = 0 \text{ for all } a \le j < b - k] \\
&\le \mathbb{P}[X_j X_{k+j} = 0 \text{ for all } j \in J].
\end{align*}

However, property (i) above ensures that the random variables $X_j X_{k+j}$ are independent of
one another as $j$ ranges over $J$. Therefore

\[ \mathbb{P}[k \notin R - R] \le \prod_{j \in J} \mathbb{P}[X_j X_{k+j} = 0] = (3/4)^{|J|} \le (3/4)^{(b-a)/3} \]

by property (ii) above. □

Lemma 11. Suppose that $A$ is a uniformly randomly chosen subset of $\{0, \dots, n-1\}$. Then for
any integer $1 \le k \le n/2$, the probability $\mathbb{P}[k \notin A - A] \le (3/4)^{n/3}$, while for any integer
$n/2 \le k \le n - 1$, the probability $\mathbb{P}[k \notin A - A] = (3/4)^{n-k}$.

Proof. The first assertion follows immediately from Lemma 10 upon setting $a = 0$ and $b = n$,
while the second assertion follows immediately from Lemma 9 upon setting $\ell = u = 0$
and $L = U = \emptyset$. □
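
Lemma 11 can likewise be compared with exhaustive enumeration for small $n$. The sketch below is illustrative only; `missing_difference_probability` is a hypothetical helper, and subsets are encoded as bitmasks so that "some $j$ and $j + k$ both lie in $A$" becomes a single bitwise test.

```python
from fractions import Fraction


def missing_difference_probability(n, k):
    """Exact P[k not in A - A] for A uniform among subsets of {0, ..., n-1}."""
    misses = 0
    for mask in range(1 << n):
        # Bit j of (mask & (mask >> k)) is set iff both j and j + k lie in A.
        if (mask & (mask >> k)) == 0:
            misses += 1
    return Fraction(misses, 1 << n)


if __name__ == "__main__":
    n = 10
    for k in range(1, n):
        exact = missing_difference_probability(n, k)
        if 2 * k >= n:
            print(k, exact, "formula:", Fraction(3, 4) ** (n - k))
        else:
            print(k, float(exact), "bound:", 0.75 ** (n / 3))
```

For $k \ge n/2$ the enumeration reproduces $(3/4)^{n-k}$ exactly, while for smaller $k$ it shows how much slack the crude bound $(3/4)^{n/3}$ leaves.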

We now use these lemmas to establish the following proposition, in which we want
a positive probability that many integers $k$ appear in the difference set $A - A$. Again it
suffices to combine crudely the results of Lemmas 9 and 10, since we need only a lower
bound on the probability. Once again we have emphasized ease of exposition over
optimization of the lower bound itself; in particular, we could have achieved better constants
at the expense of uglier technicalities.

Proposition 12. Let $n$, $\ell$, and $u$ be integers with $n \ge 4(\ell + u)$. Fix $L \subseteq \{0, \dots, \ell-1\}$ and
$U \subseteq \{n-u, \dots, n-1\}$. Suppose that $R$ is a uniformly randomly chosen subset of $\{\ell, \dots, n-u-1\}$,
and set $A := L \cup R \cup U$. Then the probability that

\[ \{-(n - \ell - u), \dots, n - \ell - u\} \subseteq A - A \]

is greater than $1 - 4(1/2)^{|L|+|U|} - (n/2)(3/4)^{(n-\ell-u)/3}$.

Proof. By the symmetry of $A - A$ about $0$ and the fact that $0 \in A - A$ for any nonempty
set $A$, it suffices to show that $A - A$ contains $\{1, \dots, n - \ell - u\}$. We employ the crude
inequality

\begin{align*}
\mathbb{P}\big[\{1, \dots, n - \ell - u\} \not\subseteq A - A\big] &\le \sum_{k=1}^{n-\ell-u} \mathbb{P}[k \notin A - A] \\
&\le \sum_{1 \le k < n/2} \mathbb{P}[k \notin R - R] + \sum_{n/2 \le k \le n-\ell-u} \mathbb{P}[k \notin A - A].
\end{align*}

The first sum can be bounded using Lemma 10 with $a = \ell$ and $b = n - u$; it is here that
we use the hypothesis $n \ge 4(\ell + u)$, to guarantee that every $k$ in the range $1 \le k < n/2$
satisfies $k \le 2(n - \ell - u)/3$. We obtain

\[ \sum_{1 \le k < n/2} \mathbb{P}[k \notin R - R] < \frac{n}{2} (3/4)^{(n-\ell-u)/3}. \]

The second sum can be bounded using Lemma 9, yielding

\[ \sum_{n/2 \le k \le n-\ell-u} \mathbb{P}[k \notin A - A] < \sum_{k=-\infty}^{n-\ell-u} (1/2)^{|L|+|U|} (3/4)^{n-\ell-u-k} = 4\,(1/2)^{|L|+|U|}. \]

Therefore $\mathbb{P}[\{-(n - \ell - u), \dots, n - \ell - u\} \not\subseteq A - A]$ is bounded above by
$4(1/2)^{|L|+|U|} + (n/2)(3/4)^{(n-\ell-u)/3}$, which is equivalent to the statement of the proposition. □



3 Proof of Theorem 1

In this section we show that the collections of sum-dominant sets, difference-dominant
sets, and sum-difference-balanced sets all have positive lower density. Our strategy is to
fix the "fringes" of a subset of $\{0, 1, 2, \dots, n-1\}$ (that is, stipulate which integers close to
$0$ and $n - 1$ are and are not in the set) in a way that forces the set to have missing differences
(or sums). We then use the probabilistic lemmas of the previous section to show that for
many sets with the prescribed fringes, all other sums (or differences) will be present. We
have not attempted to optimize the constants appearing in the following three theorems, in
part because the previous section would have become even more technical and ugly, and in
part because we were unlikely to have come close to the true constants (see the refined
conjecture in Section 7) in any event.

We begin by showing that a positive proportion of sets are sum-dominant. Here, 
choosing appropriate fringes is most non-trivial, compared to the two theorems that fol- 
low. 

Theorem 13. For $n \ge 15$, the number of sum-dominant subsets of $\{0, 1, 2, \dots, n-1\}$ is at least
$(2 \times 10^{-7})\,2^n$.

Proof. First, note that the bound $(2 \times 10^{-7})\,2^n$ is less than $1$ for $15 \le n \le 22$; the existence
of the single sum-dominant set $\{0, 2, 3, 4, 7, 11, 12, 14\}$ is enough to verify the theorem in
that range. Henceforth we can assume that $n \ge 23$.

Define $L := \{0, 2, 3, 7, 8, 9, 10\}$ and $U := \{n-11, n-10, n-9, n-8, n-6, n-3, n-2, n-1\}$.
We show that the number of sum-dominant subsets $A \subseteq \{0, 1, 2, \dots, n-1\}$
satisfying $A \cap \{0, \dots, 10\} = L$ and $A \cap \{n-11, \dots, n-1\} = U$ is at least $(2 \times 10^{-7})\,2^n$. For
any such $A$, the fact that $U - L$ does not contain $n - 7$ implies that $A - A$ contains neither
$n - 7$ nor $-(n - 7)$; since $A - A \subseteq \{-(n-1), \dots, n-1\}$, we see that

\[ |A - A| \le 2n - 3. \]

Therefore it suffices to show that there are at least $(2 \times 10^{-7})\,2^n$ sets $A$, satisfying
$A \cap \{0, \dots, 10\} = L$ and $A \cap \{n-11, \dots, n-1\} = U$, for which $|A + A| \ge 2n - 2$.

For any such $A$, we see by direct calculation that $A + A$ contains

\begin{align*}
L + L &= \{0, \dots, 20\} \setminus \{1\}, \\
L + U &= \{n - 11, \dots, n + 9\}, \\
U + U &= \{2n - 22, \dots, 2n - 2\}.
\end{align*}

In particular, if $23 \le n \le 32$ then $A + A$ automatically equals $\{0, \dots, 2n-2\} \setminus \{1\}$, giving
$|A + A| = 2n - 2$; the number of such $A$ is exactly $2^{n-22} > (2 \times 10^{-7})\,2^n$, since there are
$n - 22$ numbers between $11$ and $n - 12$ inclusive.
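
The deterministic part of this argument (the range $23 \le n \le 32$) is small enough to check by enumeration. The following Python sketch is illustrative only; the names `fringe_sets` and `check` are hypothetical. It builds every set with the prescribed fringes and confirms that its sumset is $\{0, \dots, 2n-2\} \setminus \{1\}$ while $n - 7$ is missing from its difference set.

```python
def sumset(s):
    return {a + b for a in s for b in s}


def diffset(s):
    return {a - b for a in s for b in s}


def fringe_sets(n):
    """Yield every A in {0..n-1} whose fringes are the L and U of Theorem 13 (n >= 23)."""
    lower = {0, 2, 3, 7, 8, 9, 10}
    upper = {n - d for d in (11, 10, 9, 8, 6, 3, 2, 1)}
    middle = list(range(11, n - 11))  # the n - 22 unconstrained positions
    for mask in range(1 << len(middle)):
        chosen = {m for i, m in enumerate(middle) if (mask >> i) & 1}
        yield lower | chosen | upper


def check(n):
    """True if every fringe set has A + A = {0..2n-2} minus {1} and misses the difference n - 7."""
    expected_sums = set(range(2 * n - 1)) - {1}
    for a in fringe_sets(n):
        d = diffset(a)
        if sumset(a) != expected_sums or (n - 7) in d or len(d) > 2 * n - 3:
            return False
    return True


if __name__ == "__main__":
    for n in range(23, 34):
        print(n, check(n))
```

The check succeeds throughout $23 \le n \le 32$ and fails at $n = 33$, where the set with empty middle misses the sum $21$; this is the point at which the probabilistic Proposition 8 takes over.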

For $n \ge 33$, Proposition 8 (applied with $\ell = u = 11$) tells us that when $A$ is chosen
uniformly randomly from all such sets, the probability that $A + A$ contains
$\{21, \dots, n-12\} \cup \{n+10, \dots, 2n-23\}$ is at least

\[ 1 - 6\left(2^{-|L|} + 2^{-|U|}\right) = 1 - 6\left(2^{-7} + 2^{-8}\right) = \frac{119}{128}. \]

In other words, there are at least $2^{n-22} \cdot 119/128 > (2 \times 10^{-7})\,2^n$ such sets $A$. For all these
sets, we see that $A + A$ again equals $\{0, \dots, 2n-2\} \setminus \{1\}$, and hence all such sets are
sum-dominant. □

The next two theorems carry out a similar approach to showing that a positive proportion
of sets are difference-dominant or sum-difference-balanced. These two results appeal
to the serviceable but crude Lemma 10, and consequently the constants that appear, as
well as the computation needed to take care of smaller values of $n$, are likewise far from
optimal.

Theorem 14. For $n \ge 4$, the number of difference-dominant subsets of $\{0, 1, 2, \dots, n-1\}$ is at
least $0.0015 \cdot 2^n$.

Proof. The bound can be verified computationally for small $n$: we have computed by
exhaustive search for $n \le 27$ the number of difference-dominant subsets of $\{0, 1, 2, \dots, n-1\}$
that contain both $0$ and $n - 1$. Counting just these sets and their translates is enough to
prove this theorem for $n \le 39$. Henceforth, we assume that $n \ge 40$.

Define $L := \{0, 2, 3\}$ and $U := \{n-2, n-1\}$. We show that the number of
difference-dominant subsets $A \subseteq \{0, 1, 2, \dots, n-1\}$ satisfying $A \cap \{0, 1, 2, 3\} = L$ and
$A \cap \{n-2, n-1\} = U$ is at least $0.0015 \cdot 2^n$. For any such $A$, the fact that $L + L$ does not contain $1$ implies
that $A + A$ does not contain $1$, and so $|A + A| \le 2n - 2$. Therefore it suffices to show that
there are at least $0.0015 \cdot 2^n$ sets $A$, satisfying $A \cap \{0, 1, 2, 3\} = L$ and $A \cap \{n-2, n-1\} = U$,
for which $|A - A| = 2n - 1$.

For any such $A$, we see by direct calculation that $A - A$ contains

\[ (L - U) \cup (U - L) = \{-(n-1), \dots, -(n-5)\} \cup \{n-5, \dots, n-1\}. \]

Furthermore, Proposition 12 (applied with $\ell = 4$, $u = 2$, and $n \ge 24$) tells us that when
$A$ is chosen uniformly randomly from all such sets, the probability that $A - A$ contains
$\{-(n-6), \dots, n-6\}$ is at least

\[ 1 - 4\,(1/2)^{|L|+|U|} - \frac{n}{2} (3/4)^{(n-\ell-u)/3} = 1 - 4\,(1/2)^{5} - \frac{n}{2} (3/4)^{(n-6)/3} = \frac{7}{8} - \frac{8n}{9} (3/4)^{n/3}. \]

As a function of $n$, this expression is increasing for $n \ge 11$, and at $n = 40$ its value is larger
than $0.107536$. In other words, there are at least $2^{n-6} \cdot 0.107536 > 0.0015 \cdot 2^n$ such sets $A$.
For all these sets, we see that $A - A$ equals $\{-(n-1), \dots, n-1\}$, and hence all such sets
are difference-dominant. □

Theorem 15. For $n \ge 1$, the number of sum-difference-balanced subsets of $\{0, 1, 2, \dots, n-1\}$ is
at least $(2 \times 10^{-5})\,2^n$.

Proof. The bound can be verified computationally for small $n$: for $n \le 27$ we have
computed the exact number of sum-difference-balanced subsets of $\{0, 1, 2, \dots, n-1\}$ that
contain both $0$ and $n - 1$. Counting only these sets and their translates proves the theorem for
$n \le 42$. Henceforth, we assume that $n \ge 43$.

Define $L := \{0, \dots, 5\}$ and $U := \{n-6, \dots, n-1\}$. We give a lower bound for the
number of sum-difference-balanced subsets $A \subseteq \{0, 1, 2, \dots, n-1\}$ satisfying $L \cup U \subseteq A$;
in fact we show that the number of such subsets with $|A + A| = |A - A| = 2n - 1$, the
maximum possible size, is at least $(2 \times 10^{-5})\,2^n$. Combining Propositions 8 and 12 (applied
with $\ell = u = 6$), we find that when $A$ is chosen uniformly randomly from all such sets,
the probability that both $A + A$ and $A - A$ are as large as possible is at least

1 - (<rW+ ^ ) - 4 (>) mB| -(-)(») , ^ / '-»-*(»)- / '. 

This function is increasing for n > 1 and takes a value larger than 0.131232 when n = 43. 
In other words, there are at least $2^{n-12} \cdot 0.131232 > (2 \times 10^{-5})\,2^n$ such sets $A$. For all these
sets, we see that $A + A$ equals $\{0, \dots, 2n-2\}$ and $A - A$ equals $\{-(n-1), \dots, n-1\}$, and
hence all such sets are sum-difference-balanced. □



4 Average values 

In this section, we prove Theorem 3 by calculating the average values of $|S - S|$ and
$|S + S|$ as $S$ ranges over the subsets of an arithmetic progression $P$ of length $n$. Since the problem is
invariant under dilations and translations, it suffices to prove the theorem in the case
$P = \{0, 1, 2, \dots, n-1\}$.

We begin by addressing the average cardinality of the sumset S + S. In fact, we can 
give an exact formula for the average size of the sumset, or equivalently for the sum of the 
sizes of all sumsets as S ranges over subsets of {0, 1,2, . . . ,n — 1}. The reason we can do so 
is essentially because of the linearity of expectations of random variables. 

Theorem 16. For any positive integer $n$, we have

\[
\sum_{S \subseteq \{0, \dots, n-1\}} |S + S| = 2^n (2n - 11) +
\begin{cases}
19 \cdot 3^{(n-1)/2}, & \text{if $n$ is odd,} \\
11 \cdot 3^{n/2}, & \text{if $n$ is even.}
\end{cases}
\tag{1}
\]

Proof. We begin with the manipulation

\begin{align*}
\sum_{S \subseteq \{0, \dots, n-1\}} |S + S|
&= \sum_{S \subseteq \{0, \dots, n-1\}} \sum_{\substack{0 \le k \le 2n-2 \\ k \in S+S}} 1
 = \sum_{k=0}^{2n-2} \sum_{\substack{S \subseteq \{0, \dots, n-1\} \\ k \in S+S}} 1 \\
&= \sum_{k=0}^{2n-2} 2^n\, \mathbb{P}[k \in S + S]
 = 2^n (2n - 1) - 2^n \sum_{k=0}^{2n-2} \mathbb{P}[k \notin S + S]. \tag{2}
\end{align*}

We suppose that $n = 2m + 1$ is odd, the case where $n$ is even being similar. We begin by
considering only the lower half of possible values for $k$. By Lemma 7 we have

\begin{align*}
\sum_{k=0}^{n-2} \mathbb{P}[k \notin S + S]
&= \sum_{j=0}^{m-1} \mathbb{P}[2j \notin S + S] + \sum_{j=0}^{m-1} \mathbb{P}[2j + 1 \notin S + S] \\
&= \sum_{j=0}^{m-1} \tfrac{1}{2} (3/4)^j + \sum_{j=0}^{m-1} (3/4)^{j+1} = 5\left(1 - (3/4)^m\right).
\end{align*}

By the symmetry of $S + S$ about $n - 1$, the same calculation holds for $\sum_{k=n}^{2n-2} \mathbb{P}[k \notin S + S]$.
Therefore, appealing to Lemma 7 again for $k = n - 1 = 2m$,

\[
\sum_{k=0}^{2n-2} \mathbb{P}[k \notin S + S] = 5\left(1 - (3/4)^m\right) + \tfrac{1}{2} (3/4)^m + 5\left(1 - (3/4)^m\right) = 10 - \tfrac{19}{2} (3/4)^{(n-1)/2}.
\]

Inserting this value into the right-hand side of equation (2) establishes the theorem for odd $n$.
A similar calculation gives the result for even $n$. □
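
Because formula (1) is exact, it can be confirmed directly for small $n$ by summing $|S + S|$ over all $2^n$ subsets. The sketch below is illustrative only; `total_sumset_size` and `theorem16_formula` are hypothetical helper names.

```python
def total_sumset_size(n):
    """Sum of |S + S| over all subsets S of {0, ..., n-1}, by brute force."""
    total = 0
    for mask in range(1 << n):
        s = [j for j in range(n) if (mask >> j) & 1]
        total += len({a + b for a in s for b in s})
    return total


def theorem16_formula(n):
    """Right-hand side of equation (1)."""
    if n % 2 == 1:
        return 2 ** n * (2 * n - 11) + 19 * 3 ** ((n - 1) // 2)
    return 2 ** n * (2 * n - 11) + 11 * 3 ** (n // 2)


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, total_sumset_size(n), theorem16_formula(n))
```

For instance, $n = 3$ gives $8 \cdot (6 - 11) + 19 \cdot 3 = 17$, which matches the eight subsets of $\{0, 1, 2\}$ counted by hand: sumset sizes $0, 1, 1, 1, 3, 3, 3, 5$.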

While it is possible to write down an exact formula for the average size of the differ- 
ence set S — S as S ranges over all subsets of {0, 1, 2, . . . , n — 1}, the formula would be far 
too ugly to be of use. We prefer in this case to present a simple asymptotic formula with a 
reasonable error term. 

Theorem 17. For any positive integer $n$, we have

\[ \sum_{S \subseteq \{0, \dots, n-1\}} |S - S| = 2^n (2n - 7) + O\!\left(n\, 6^{n/3}\right). \tag{3} \]
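
Proof. As in the proof of the previous theorem, we have

\begin{align*}
\sum_{S \subseteq \{0, \dots, n-1\}} |S - S|
&= \sum_{S \subseteq \{0, \dots, n-1\}} \sum_{\substack{-(n-1) \le k \le n-1 \\ k \in S-S}} 1
 = \sum_{k=-(n-1)}^{n-1} \sum_{\substack{S \subseteq \{0, \dots, n-1\} \\ k \in S-S}} 1 \\
&= \sum_{k=-(n-1)}^{n-1} 2^n\, \mathbb{P}[k \in S - S] \\
&= 2^n (2n - 1) - 2^n \sum_{k=-(n-1)}^{n-1} \mathbb{P}[k \notin S - S] \\
&= 2^n (2n - 1) - 1 - 2^{n+1} \sum_{k=1}^{n-1} \mathbb{P}[k \notin S - S], \tag{4}
\end{align*}

the last equality following from the symmetry of $S - S$ around $0$ and the fact that $0$ is in
$S - S$ for nonempty $S$. From Lemma 11 we have

\[ \sum_{k=1}^{\lceil n/2 \rceil - 1} \mathbb{P}[k \notin S - S] \le \sum_{k=1}^{\lceil n/2 \rceil - 1} (3/4)^{n/3} < n\, (3/4)^{n/3} \]

and

\[ \sum_{k=\lceil n/2 \rceil}^{n-1} \mathbb{P}[k \notin S - S] = \sum_{k=\lceil n/2 \rceil}^{n-1} (3/4)^{n-k} = 3\left(1 - (3/4)^{n - \lceil n/2 \rceil}\right), \]

which combine to give

\[ \sum_{k=1}^{n-1} \mathbb{P}[k \notin S - S] = 3 + O\!\left(n\, (3/4)^{n/3}\right). \]

Inserting this expression into the right-hand side of equation (4) establishes the theorem. □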

Examining the derivations of these two theorems reveals that it really is the
commutativity $s_1 + s_2 = s_2 + s_1$ that causes the difference in the average sizes of $S + S$ and $S - S$: a
typical potential element of $S + S$ has only about half as many chances to be realized as a
sum as the corresponding potential element of $S - S$ has at being realized as a difference.
To further emphasize this observation, we note that if the single set $S$ is replaced by two
sets $S$ and $T$, the disparity disappears: for an arithmetic progression $P$ of length $n$, we have

\[ \frac{1}{4^n} \sum_{S \subseteq P} \sum_{T \subseteq P} |S - T| \sim \frac{1}{4^n} \sum_{S \subseteq P} \sum_{T \subseteq P} |S + T| \sim 2n - 7. \]

5 Sets with prescribed imbalance between sums and differences 

In this section we prove that the range of possible values for $|S + S| - |S - S|$ is all of $\mathbb{Z}$.
Furthermore, as asserted in Theorem 4, our constructions show that for every integer $x$, we
can find a subset $S$ of $\{0, \dots, 17|x|\}$ such that $|S + S| - |S - S| = x$. As one might expect
from the foregoing discussion, the case where $x$ is negative is easiest.

Negative values of $x$. For any integer $x < 0$, set $S_x = \{0, \dots, |x| + 1\} \cup \{2|x| + 2\}$. Then
$S_x + S_x = \{0, 1, \dots, 3|x| + 3\} \cup \{4|x| + 4\}$ while $S_x - S_x = \{-(2|x| + 2), \dots, 2|x| + 2\}$,
whereupon

\[ |S_x + S_x| - |S_x - S_x| = (3|x| + 5) - (4|x| + 5) = -|x| = x. \]

Even more generally, take any integer $n \ge |x| + 2$ and set $S = \{0, \dots, n-1\} \cup \{n + |x|\}$.
Then $S + S = \{0, \dots, 2n + |x| - 1\} \cup \{2n + 2|x|\}$ and $S - S = \{-(n + |x|), \dots, n + |x|\}$,
which again yields $|S + S| - |S - S| = x$.

We turn now to nonnegative values of $x$. Our general construction works for larger
values of $x$, but we need to handle a few small values of $x$ individually.

A few special cases. For a few small values of $x$, we find suitable sets $S_x$ simply by
computation: if we set

\begin{align*}
S_0 &:= \emptyset \\
S_1 &:= \{0, 2, 3, 4, 7, 11, 12, 14\} \tag{5} \\
S_2 &:= \{0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17\} \\
S_4 &:= \{0, 1, 2, 4, 5, 9, 12, 13, 17, 20, 21, 22, 24, 25\},
\end{align*}

then in each case it can be checked that $|S_x + S_x| - |S_x - S_x| = x$. In fact, these examples
are all minimal in the sense that the diameter $\max S - \min S$ is as small as possible.
(Vishaal Kapoor and Erick Wong confirmed computationally the fact that $S_4$ is the unique,
up to reflection, set of integers of diameter at most 25 for which the sumset has four more
elements than the difference set. We note that Pigarev and Freiman [9] gave the slightly
larger example $S_4' = \{0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17, 21, 24, 25, 26, 28, 29\}$, which also
satisfies $|S_4' + S_4'| - |S_4' - S_4'| = 4$.)

In fact, these diameter-minimal examples are unique, up to reflection, except for $S_1$:
there are two other subsets of $\{0, \dots, 14\}$, namely

\[ S_1' = \{0, 1, 2, 4, 5, 9, 12, 13, 14\} \]

and its reflection, for which the sumset has one element more than the difference set. The
first set $S_1$ has only eight elements, as compared with the nine elements of $S_1'$. In fact,
Hegarty [2] has shown that $S_1$ is also the sum-dominant set with the smallest cardinality,
unique up to dilation, translation, and reflection. On the other hand, there are tantalizing
similarities among the sets $S_1'$, $S_2$, $S_4$, and $S_4'$ that might admit a clever generalization.

We note that Ruzsa [11] claimed that $U = \{0, 1, 3, 4, 5, 6, 7, 10\}$ is sum-dominant, but
this is incorrect: both $U + U$ and $U - U$ have 19 elements. We also mention the following
observation of Hegarty: if one sets

\begin{align*}
A &= S_4 \cup (S_4 + 20) \\
&= \{0, 1, 2, 4, 5, 9, 12, 13, 17, 20, 21, 22, 24, 25, 29, 32, 33, 37, 40, 41, 42, 44, 45\},
\end{align*}

then one has $|A + A| = 91$ and $|A - A| = 83$, providing the statistic $\log 91 / \log 83 =
1.0208\ldots$, which is important when using the elements of $A$ as "digits". More precisely,
considering sets of the form $A_n = A + bA + b^2 A + \cdots + b^{n-1} A$ for suitably large fixed $b$,
we have $|A_n + A_n| = |A_n - A_n|^{1.0208\ldots}$, which is currently the best exponent known.

For other positive values of $x$, the basic general construction is an adaptation of the
subset $S_1 \times \{0, \dots, k\}$ of $\mathbb{Z} \times \mathbb{Z}$, embedded in $\mathbb{Z}$ itself by a common technique of regarding
the coordinates as digits in a base-$b$ representation for suitably large $b$. In our simple case,
we can be completely explicit from the start.

Odd values of $x$ exceeding 1. Let $x = 2k + 1$ with $k \ge 1$. With $S_1$ defined as in equation (5),
set

\begin{align*}
S_{2k+1} &= S_1 + \{0, 29, 58, \dots, 29k\} \tag{6} \\
&= \{0 \le s \le 29k + 14 : s \equiv 0, 2, 3, 4, 7, 11, 12, \text{ or } 14 \pmod{29}\}.
\end{align*}

Then we find that

\begin{align*}
S_{2k+1} + S_{2k+1} &= (S_1 + S_1) + \{0, 29, 58, \dots, 58k\} \\
&= \{0 \le s < 29(2k + 1) : s \not\equiv 1, 20, \text{ or } 27 \pmod{29}\},
\end{align*}

which reveals that $|S_{2k+1} + S_{2k+1}| = 26(2k + 1)$. On the other hand,

\[ S_{2k+1} - S_{2k+1} = \{-29(k + \tfrac{1}{2}) < s < 29(k + \tfrac{1}{2}) : s \not\equiv -13, -6, 6, \text{ or } 13 \pmod{29}\}, \]

showing that $|S_{2k+1} - S_{2k+1}| = 25(2k + 1)$, and so $|S_{2k+1} + S_{2k+1}| - |S_{2k+1} - S_{2k+1}| = 2k + 1$
as desired.

Even values of $x$ exceeding 4. Let $x = 2k$ with $k \ge 3$. With $S_{2k+1}$ defined as in equation (6),
set $S_{2k} = S_{2k+1} \setminus \{29\}$. One can check that $S_{2k} - S_{2k}$ still equals all of $S_{2k+1} - S_{2k+1}$ but that
$S_{2k} + S_{2k} = (S_{2k+1} + S_{2k+1}) \setminus \{29\}$. Therefore

\[ |S_{2k} + S_{2k}| - |S_{2k} - S_{2k}| = |S_{2k+1} + S_{2k+1}| - |S_{2k+1} - S_{2k+1}| - 1 = 2k \]

as desired. Notice that $S_{2k}$ is indeed contained in $\{0, \dots, 17(2k)\}$ as asserted by Theorem 4,
the closest call being the comparison between $\max S_6 = 101$ and $17 \cdot 6 = 102$.
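
The constructions of this section are fully explicit, so Theorem 4 can be spot-checked for a range of $x$. The following Python sketch is illustrative only; `construct` and `imbalance` are hypothetical names, and the dictionary `SPECIAL` simply records the sets $S_0$, $S_1$, $S_2$, $S_4$ from equation (5).

```python
S1 = {0, 2, 3, 4, 7, 11, 12, 14}
SPECIAL = {
    0: set(),
    1: S1,
    2: {0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17},
    4: {0, 1, 2, 4, 5, 9, 12, 13, 17, 20, 21, 22, 24, 25},
}


def construct(x):
    """Return the set S_x described in this section, with |S + S| - |S - S| = x."""
    if x < 0:
        return set(range(abs(x) + 2)) | {2 * abs(x) + 2}
    if x in SPECIAL:
        return set(SPECIAL[x])
    k = x // 2
    # S_{2k+1} = S_1 + {0, 29, ..., 29k}, as in equation (6)
    odd = {s + 29 * i for s in S1 for i in range(k + 1)}
    return odd if x % 2 == 1 else odd - {29}


def imbalance(s):
    """Return |S + S| - |S - S|."""
    sums = {a + b for a in s for b in s}
    diffs = {a - b for a in s for b in s}
    return len(sums) - len(diffs)


if __name__ == "__main__":
    for x in range(-5, 11):
        s = construct(x)
        print(x, imbalance(s), max(s, default=0), 17 * abs(x))
```

The last two printed columns compare the largest element of $S_x$ with the bound $17|x|$ from Theorem 4.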

We note that as this manuscript was in preparation, Hegarty [2, Theorem 9] independently
proved that $|S + S| - |S - S|$ can take all integer values $t$. In fact he proved, extending
ideas originating in our proofs, somewhat more: for each fixed integer $t$,
if $n$ is sufficiently large then a positive proportion of subsets $S$ of $\{0, 1, 2, \dots, n-1\}$ satisfy
$|S + S| - |S - S| = t$.

6 Analysis of data 

Theorem 3 gave the expected values of $|S + S|$ and $|S - S|$, which seems most naturally
phrased as saying that the expected number of missing sums is asymptotically 10, while
the expected number of missing differences is asymptotically 6. One is naturally led to
enquire as to the details of the joint distribution of these two quantities. Let $c_n(x, y)$ be the
number of subsets of $\{0, 1, 2, \dots, n-1\}$ with $|S + S| = x$ and $|S - S| = y$. Figure 1 shows a
square centered at $(x, y) \in \mathbb{Z}^2$ whose area is proportional to $\log(1 + c_{25}(x, y))$. Also shown
are the lines $x = 2^{-25} \sum_S |S + S|$ (the average size of a sumset), $y = 2^{-25} \sum_S |S - S|$ (the
average size of a difference set), and $y = x$.




Figure 1: The size of the square centered at $(x, y)$ indicates the number of subsets of
$\{0, \dots, 24\}$ with $(|S + S|, |S - S|) = (x, y)$.

Figure 2 shows the observed distribution of $X := 2n - 1 - |S + S|$ (that is, the number
of missing sums) for three million randomly generated subsets of $\{0, 1, 2, \dots, 999\}$. For
example, the histogram shows that approximately 1.4% of these subsets $S$ have the largest
possible sumset $S + S = \{0, \dots, 1998\}$, approximately 2.1% of them have exactly one
element of $\{0, \dots, 1998\}$ missing from their sumsets, and so on. The histogram is essentially
identical to one generated from the complete data set for subsets of $\{0, \dots, 26\}$.




Figure 2: The observed frequencies of the number of missing sums
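A histogram of this kind can be approximated by random sampling. The following is a minimal sketch, assuming each element of {0, ..., n - 1} is included independently with probability 1/2; the function names, the default seed, and the small parameters are illustrative choices, not the setup used for the three million samples reported above.

```python
import random
from collections import Counter

def missing_sums(s, n):
    """Number of elements of {0, ..., 2n-2} absent from S + S, for S inside {0, ..., n-1}."""
    s = list(s)
    sums = {a + b for a in s for b in s}
    return (2 * n - 1) - len(sums)

def sample_missing_sums(n, trials, seed=0):
    """Histogram of missing-sum counts over uniformly random subsets of {0, ..., n-1}."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(trials):
        s = [i for i in range(n) if rng.random() < 0.5]
        hist[missing_sums(s, n)] += 1
    return hist

hist = sample_missing_sums(n=60, trials=2000)
for value in sorted(hist):
    print(value, hist[value] / 2000)
```

Pure Python is slow for n = 1000 with millions of trials; a bitset or array-based implementation would be a natural substitute at that scale.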

Notice that there is a "divot" at the top of the histogram: the observed frequencies 
of sets missing exactly 6 or 8 sums are both larger than the observed frequency of sets 
missing exactly 7 sums. In fact, the frequency for every even value seems to be larger than 
the average of its two neighbors, while the opposite is true for the frequencies of the odd 
values; in other words, the piecewise linear graph that connected the points at the tops of 
the histogram's bars would alternate between being convex and concave. 

Recall that the missing sums are typically very near the edges of the interval of possible
sums. In particular, the missing sums for a subset S of {0, ..., 999} tend to be near either
0 or 1998, and the two groups are therefore so far apart that their numbers are independent.
Therefore the distribution shown in Figure 2 is the sum of two independent, identically
distributed (by symmetry) random variables that count the number of missing sums near one end.
Each of these has essentially the same distribution as the number Y of missing sums in randomly
chosen (infinite) subsets A of the nonnegative integers {0, 1, ...}. That is, if Y_1, Y_2 are
independent with the same distribution as Y, then X and Y_1 + Y_2 have approximately the
same distribution (for large n).
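The distribution of a sum of two independent copies of a random variable is the convolution of its probability mass function with itself. As a sketch of that step, assuming the distribution of Y is given as a finite (truncated) list of probabilities indexed by value, one could write the following; the function name `convolve` is hypothetical.

```python
def convolve(p, q):
    """Probability mass function of U + V for independent U ~ p and V ~ q.

    p[i] is P[U = i] and q[j] is P[V = j]; both lists are finite.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Toy example: two independent fair coin flips.
print(convolve([0.5, 0.5], [0.5, 0.5]))   # [0.25, 0.5, 0.25]
```

Applying `convolve(y, y)` to an estimated distribution `y` of Y would give an approximation to the distribution of X for large n.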

At first one might think, then, that the parity phenomenon in Figure 2 is caused by that
distribution being the sum of two independent copies of a simpler distribution. However,
in this latter distribution (the first histogram in Figure 3), the disparity between odd and
even values is even more apparent.

Fortunately, the phenomenon here is easy to analyze: if 0 is not in our randomly chosen
subset of {0, 1, ...}, then there are automatically 2 missing sums, namely 0 and 1, and the
rest of the random subset can be shifted downwards by 1 to find the distribution of other
missing sums:

P[Y = k] = Σ_{i=0}^{⌊k/2⌋} P[Y = k - 2i | min A = 0] 2^{-i-1}.

In other words, there is a yet more fundamental distribution (the second histogram in
Figure 3), given by the number of missing subsums in a randomly chosen subset of {0, 1, ...}
containing 0. For example, that histogram shows that if a subset S of {0, 1, ...} containing
0 is chosen at random, there is about a 23.6% chance that S + S = {0, 1, ...}.
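The passage from the conditional distribution (subsets containing 0) to the unconditional one can be sketched numerically. The sketch assumes that min A = i occurs with probability 2^{-(i+1)} for a uniformly random subset A of {0, 1, ...}, and that each of the i skipped initial elements contributes exactly 2 missing sums, as the shifting argument above indicates. The function name and the list-based representation are hypothetical.

```python
def unconditional_from_conditional(q):
    """Given q[k] = P[Y = k | min A = 0], return p with p[k] = P[Y = k].

    Assumes P[min A = i] = 2^{-(i+1)} and that min A = i forces 2i missing sums.
    The result is truncated to the same length as q.
    """
    p = []
    for k in range(len(q)):
        total = 0.0
        for i in range(k // 2 + 1):
            total += q[k - 2 * i] * 2.0 ** (-(i + 1))
        p.append(total)
    return p

# Toy input: if no sums were ever missing once 0 is in the set,
# the unconditional distribution would live on even values only.
print(unconditional_from_conditional([1.0, 0.0, 0.0, 0.0, 0.0]))
```

With this toy input all the mass lands on even values of k, which matches the observation that the odd/even disparity is more pronounced in the unconditional histogram than in the conditional one.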




Figure 3: The observed frequencies of the number of missing sums for randomly chosen
subsets of {0, 1, ...}, with no restriction (left) and with the restriction that 0 belongs to the
set (right)

The parity discrepancy seems to be absent in this last distribution, suggesting that
it should be the focus of further analysis; the two more complicated preceding distributions
can be reconstructed from suitable manipulations of this most fundamental one. The
histogram suggests the existence of a function f(x), smooth and decaying faster than
exponentially, such that the probability that a randomly chosen subset of {0, 1, ...} containing
0 is missing exactly n subsums is f(n).

It would of course be interesting to do a similar empirical analysis for the distribu- 
tion of the number of missing differences; perhaps their joint distribution could even be 
reduced to a simpler one using similar observations. 

7 Conjectures and open problems 

We have already conjectured, in Conjecture 2, that the proportions of difference-dominant,
sum-difference-balanced, and sum-dominant subsets of {0, 1, 2, ..., n - 1} approach
nonzero limits as n tends to infinity. (As long as the limits do in fact exist, Theorem 1
shows that they are necessarily nonzero.) Figure 4 shows the observed proportions, for
n < 27, of the subsets of {0, 1, 2, ..., n - 1} that are difference-dominant,
sum-difference-balanced, and sum-dominant, respectively. Note particularly that each graph is monotonic
in n, supporting our conjecture that the limits exist. Using ten million randomly chosen
subsets of {0, 1, ..., 999}, we estimate:

Figure 4: The probability of a random subset of {0, ..., n - 1} being sum-dominant (top
graph), difference-dominant (middle graph), or sum-difference-balanced (bottom graph)

Conjecture 18. Using the notation of Conjecture 2,
p_- ≈ 0.93, p_+ ≈ 0.00045, and p_= ≈ 0.07.
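For small n the three proportions can be computed exactly by enumerating all subsets. The sketch below is an illustration under that assumption only: the names `classify` and `proportions` are hypothetical, and the exhaustive loop is far too slow to reproduce the data for n near 27 or the random sampling at n = 1000.

```python
from itertools import combinations

def classify(s):
    """Classify a finite set by comparing |S + S| with |S - S|."""
    sums = {a + b for a in s for b in s}
    diffs = {a - b for a in s for b in s}
    if len(diffs) > len(sums):
        return "difference-dominant"
    if len(sums) > len(diffs):
        return "sum-dominant"
    return "balanced"

def proportions(n):
    """Exact proportion of each class among all subsets of {0, ..., n-1}."""
    tally = {"difference-dominant": 0, "balanced": 0, "sum-dominant": 0}
    for size in range(n + 1):
        for s in combinations(range(n), size):
            tally[classify(s)] += 1
    total = 2 ** n
    return {name: count / total for name, count in tally.items()}

print(proportions(4))
```

For n = 4 only {0, 1, 3} and {0, 2, 3} are difference-dominant and every other subset is balanced, so the small-n values are far from the conjectured limits.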



In fact the philosophy behind Theorem 1 suggests somewhat more: a typical subset 
of {0, 1, 2, . . . , n — 1} will achieve virtually all possible sums and differences, and the ones 
that aren't achieved are due to the edges of the subset. Since a positive proportion of sets 
have any prescribed edges, we make the following conjecture. Define 



p j/k := lim (2-"#{S C {0,1,2,..., n 

IS 



1}: 

SI = In 



1-;, \S-S\ =2n-l-fc}), (7) 

assuming the limit exists. Since the difference set S - S is symmetric about 0 and thus
always has odd cardinality, we never have |S - S| = 2n - 1 - k with k odd. Therefore we
conjecture:

Conjecture 19. For any nonnegative integers j and k with k even, the limiting proportion p_{j,k}
defined above in (7) exists and is positive; furthermore,

Σ_{j=0}^{∞} Σ_{k=0, k even}^{∞} p_{j,k} = 1.
Remark. Given Theorem 3, it seems reasonable to conjecture also that

Σ_{j=0}^{∞} Σ_{k=0, k even}^{∞} k p_{j,k} = 6   and   Σ_{j=0}^{∞} Σ_{k=0, k even}^{∞} j p_{j,k} = 10.

For any particular pair j, k, if a single finite configuration of edges could be found that
omitted exactly j possible sums and k possible differences, the methods of this paper would
then show that p_{j,k} > 0 (technically that the analogous expression with lim inf_{n→∞} in place
of lim_{n→∞} is positive).

The last remark suggests as well the following open problem, for which a simple proof 
might exist, though we have not been able to find one. 

Conjecture 20. For any nonnegative integers j and k with k even, there exists a positive integer
n, and a set S ⊆ {0, 1, 2, ..., n - 1} with 0 ∈ S and n - 1 ∈ S, such that |S + S| = 2n - 1 - j
and |S - S| = 2n - 1 - k.

Hegarty points out that his methods from [2] can establish both Conjecture 19 and
Conjecture 20 in the case j > k/2.

We know (TH5) that essentially all subsets of {0, 1, 2, ..., n - 1} of cardinality o(n^{1/4})
are Sidon sets and hence difference-dominant sets. More generally, we can show (perhaps
in a sequel paper) that if m = o(n^{1/2}), then almost all subsets of {0, 1, 2, ..., n - 1} of
cardinality m are difference-dominant sets.

This result may indicate the presence of a threshold. Let p_n vary with n, and
define n independent random variables X_i, with X_i = 1 with probability p_n. This defines
a random set A := {i ∈ {0, 1, 2, ..., n - 1} : X_i = 1}. The observations above can then be
rephrased in the following way: if p_n = o(n^{-1/2}), then A is a difference-dominant set with
probability tending to 1 (as n → ∞). We showed in this article that if p_n = 1/2, then A is a
sum-dominant set with positive probability (as n → ∞), and our result is easily extended to
p_n = c > 0. An important unanswered question is "Which sequences p_n generate a
sum-dominant set with positive probability?" Perhaps our last conjecture captures the correct
notion:

Conjecture 21. For each n ≥ 1, let X_{n,0}, X_{n,1}, ..., X_{n,n-1} be independent identically distributed
random variables, and set A_n := {i : 0 ≤ i < n, X_{n,i} = 1}. If both |A_n| → ∞ and |A_n|/n → 0
with probability 1, then the probability that A_n is difference-dominant also goes to 1.
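The random model behind this conjecture is easy to experiment with. The following is a minimal sketch, assuming each i in {0, ..., n - 1} is included independently with the same probability p; the function names, the seed, and the sample parameters are hypothetical choices for illustration, and a simulation of this kind can only suggest, not establish, the limiting behaviour.

```python
import random

def is_difference_dominant(s):
    """True when |S - S| > |S + S|."""
    sums = {a + b for a in s for b in s}
    diffs = {a - b for a in s for b in s}
    return len(diffs) > len(sums)

def random_set(n, p, rng):
    """Include each element of {0, ..., n-1} independently with probability p."""
    return {i for i in range(n) if rng.random() < p}

def estimate_difference_dominant(n, p, trials, seed=0):
    """Empirical frequency of difference-dominant sets in the Bernoulli(p) model."""
    rng = random.Random(seed)
    hits = sum(is_difference_dominant(random_set(n, p, rng)) for _ in range(trials))
    return hits / trials

# Sparse regime (p -> 0 while n * p -> infinity) versus the dense case p = 1/2.
n = 400
print(estimate_difference_dominant(n, n ** -0.5, trials=200))
print(estimate_difference_dominant(n, 0.5, trials=200))
```

Comparing the two printed frequencies for increasing n gives a rough feel for how the proportion of difference-dominant sets depends on the density p.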